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An Exact Dual Adjoint Solution Method 
for Turbulent Flows on Unstructured Grids 


Eric J. Nielsen*, James Lu^, Michael A. Park^, and David L. Daimofal^ 

An algorithm for solving the discrete adjoint system based on an unstructured -grid discretization of the 
Navier-Stokes equations is presented. The method is constnuted such that an adjoint solution exactly dual to a 
direct differentiation approach is recovered at each time step, yielding a convergence rate which is asymptotically 
equivalent to that of the primal system. The new approach is implemented within a three-dimensional unstruc- 
tured-grid framework and results are presented for inviscid, laminar, and turbulent flows. Improvements to the 
baseline solution algorithm, such as line-implicit relaxation and a tight coupling of the turbulence model, are also 
presented. By storing nearest-neighbor terms in the residual computation, the dual scheme is computationally 
efficient, while requiring twice the memory of the flow solution. The scheme is expected to have a broad impact 
on computational problems related to design optimization as Avell as error estimation and grid adaptation efforts. 


Introduction 

The field of computational fluid dynamics (CFD) is increasingly 
important in the design and validation of new aerodynamic con- 
cepts. The use of computational tools can greatly reduce the need 
for more costly alternatives, such as wind tunnel experiments or 
flight testing. CFD techniques based on the Navier-Stokes equa- 
tions have matured into reliable tools for the analysis of complex 
geometries. However, design optimization using these high-fidel- 
ity methods has been greatly inhibited by their relatively high ex- 
pense and lack of robusmess. 

In an effort to alleviate the high cost of gradient-based design 
methodologies, recent work has focused on the use of adjoint 
methods. Adjoint techniques generally fall into one of two cate- 
gories based on the order in which the discretization and differenti- 
ation processes are performed. The two approaches are termed 
continuous or discrete adjoint methods. For a single output or con- 
straint function, these schemes allow the computation of design 
sensitivities at the cost of solving a single additional linear prob- 
lem and a subsequent computation of a matrix -vector product di- 
mensioned by the number of design variables. 

In addition to its role in design optimization problems, the solu- 
tion of the adjoint equation can be used to provide error estimates 
for an output of engineering interest, as well as grid adaptation in- 
formation that can be used to improve solution accuracy. Tra- 
ditional methods for grid adaptation have typically relied on ad 
hoc feature-based quantities. In these approaches, features 
such as shocks, boundary layers, and other regions of interest are 
characterized by large gradients or curvatures in the solution. The 
adaptation algorithm then attempts to improve the solution by add- 
ing additional grid points in these regions. Unfortunately this ap- 
proach can yield grids of unwieldy dimensions and even incorrect 
results. The local flow feature of interest is often over-resolved, 
while smooth regions of the flowfield are essentially neglected. 

The adjoint approach to grid adaptation seeks to minimize the 
uncertainty or error in some specified output function. In this ap- 
proach, a local adaptation parameter is obtained by combining 
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flow and adjoint solutions, where the nonlinearities in these solu- 
tions are weighted with the local residual error. This adaptation 
technique implicitly targets the flow features having the highest 
impact on the output of interest. The test cases examined in Refs. 
24, 25, and 27-29 clearly demonstrate the potential of using such 
an approach to grid adaptation. Comparisons with feature-based 
techniques are shown and results equivalent to those of uniform 
g] id-refinement studies are obtained at a fraction of the cost. The 
process terminates when a user-specified error tolerance is 
achieved. 

Unfortunately the solution of the adjoint system of equations for 
realistic problems has proven to be a formidable task. In Ref. 4, a 
Krylov method was used to solve the adjoint system for turbulent 
flows using relatively coarse grids over simple geometries. This 
work successfully demonstrated the accuracy of the method. Mow- 
er er, the computational time required to solve the equations was as 
much as ten times the cost of the analysis problem, and the scheme 
failed to converge for many problems. By employing a more ex- 
tensive preconditioner for the Krylov algorithm. Ref. 5 demon- 
strated improved performance. However, this preconditioning 
strategy was shown to require approximately five times the mem- 
ory of the baseline analysis scheme. This approach proved infeasi- 
ble on available computers for large-scale problems that require 
grids containing several million mesh points. 

The focus of the current work is to develop a new solution algo- 
rithm for the discrete adjoint system described in Refs. 4 and 5. 
Tlie work is largely based on the recent contributions of Giles, 
in which adjoint solutions for the Euler and Navier-Stokes equa- 
tions are computed by using an explicit Runge-Kutta scheme com- 
bined with multigrid using an exact dual method. Elliott^"* recently 
used a similar approach for two-dimensional turbulent flows on 
os erset structured grids using multigrid and approximate factoriza- 
tion. 

Improvements to the implicit solution technique of Refs. 35 and 
3o are described, and an exact dual scheme is developed for the re- 
sulting algorithm. Care has been taken to ensure that the imple- 
mentation yields identical values for linear functionals at each time 
step, thereby guaranteeing identical asymptotic convergence rates 
for the primal and dual systems. Results are shown for several 
small test problems and two large-scale configurations. Conver- 
gence of a typical cost function and its derivatives are examined, 
and computational speed and memory requirements are discussed. 

Flow Solution Method 

The governing equations are the three-dimensional Reynolds- 
a^eraged Navier-Stokes equations. For turbulent flows, the one- 
equation model of Spalart and Allmaras-^® is used. The flow solver 
used in the current work is described at length in Refs. 1,35, and 
3?). The code uses an implicit, upwind, finite-volume discretization 
in which the dependent variables are stored at the mesh vertices. 



Inviscid fluxes at cell interfaces are computed by using the upwind 
schemes of Roe^"^ or Van Leer."^^^ Viscous fluxes are formed by 
using an approach equivalent to a central-difference Galerkin pro- 
cedure. For steady-state flows, temporal discretization is performed 
by using a backward Euler time-stepping scheme. A highly scalable 
parallelization scheme is achieved through domain decomposition 
and MPI communication. 

An approximate solution of the linear system of equations 
formed at each time step is obtained through several iterations of a 
point-iterative scheme in which the nodes are updated in an even- 
odd fashion, resulting in a Gauss-Seidel-type method. This scheme 
is augmented with a line-relaxation algorithm in the current work. 

In Refs. 1-5, 35, and 36, the turbulence model is integrated all 
the way to the wall without the use of wall functions, and is solved 
separately from the flow equations at each time step with an identi- 
cal time-stepping scheme. The resulting linear system is solved 
with the same iterative scheme employed for the flow equations. 
The impact of coupling the flow equations and turbulence model 
will be addressed further in a subsequent section. 

In Refs. 1, 4, and 5, a discrete adjoint capability has been devel- 
oped for the solver. In these references, the discretization of the 
flow equations and turbulence model described above has been 
fully differentiated by hand, and the adjoint system of equations has 
been solved by using a preconditioned GMRES^^ algorithm. The 
focus of the current work is an alternate solution method based on 
an exact dual formulation. 

Tightly Coupled Turbulence Model 

The loosely coupled implementation of the turbulence model in 
the baseline solver was originally chosen for its convenience in ac- 
commodating additional models as they became available. Further- 
more, the loose formulation allows for straightforward implicit en- 
forcement of positivity on the eddy viscosity by using M-type 
matrices and positive operators as described in Ref. 38. 

Although the loose formulation has proven satisfactory for engi- 
neering-level analysis, it often results in stalled convergence or 
limit-cycle oscillations that can be detrimental to subsequent ad- 
joint computations. In the current work, a tightly coupled algorithm 
is used to obtain more robust convergence behavior. The scheme 
includes the linearizations of the governing flow equations with re- 
spect to the turbulence model dependent variable \) , as well as the 
linearizations of the turbulence model with respect to the conserved 
variables Q , in the computation of the solution updates AQ and 
A\> . The approach taken in Ref. 38 to guarantee positivity on v be- 
comes prohibitively difficult to impose for the tightly coupled sys- 
tem, so that an update clip at \) = 0 is necessary to preclude non- 
physical behavior as the solution rapidly develops from its initial 
freestream conditions. This procedure occasionally admits tran- 
sients in the early stages of a computation which may result in di- 
vergence of the solution; these transients are overcome by using 
small time steps as the initial solution sets up. Although not dis- 
cussed in the present paper, limited success has been achieved 
through modifications as suggested in Ref, 4 1 , in which an analysis 
of the linearized form of the turbulence source terms suggests a 
similar addition to the diagonal elements of the system. 

To demonstrate the effect of coupling on solution convergence, 
transonic flow over an ONERA M6 wing**^ is computed by using 
both the loosely and tightly coupled formulations. The grid is 
shown in Fig. 1 and contains 359,536 nodes and 2,074,955 tetrahe- 
dra. The freestream Mach number is 0.84, the angle of attack is 
3.06® , and the Reynolds number is 5 million based on the mean 
aerodynamic chord. For both cases, the CFL number for the flow 
and turbulence equations has been linearly ramped from 10 to 200 
over the first 50 iterations. The convergence histories for density 
and turbulence for the two solution methods are shown in Fig. 2. 
The loosely coupled scheme results in stalled convergence, whereas 
the tightly coupled scheme steadily reduces the residuals by six or- 
ders of magnitude over the course of the computation. Similar re- 
sults have been observed in other cases, and the tightly coupled al- 
gorithm is employed for the remainder of the current work. 



Figure 1. Surface grid for viscous ONERA M6 uing. 



Figure 2. Effect of turbulence model coupling on ONERA M6 flow 
solution. 

Line-Implicit Relaxation Scheme 

When used in conjunction with the baseline point-implicit 
scheme used for the flow solution, a line-implicit scheme can result 
in improved convergence rates for viscous flows. The benefits of 
line-relaxation techniques on highly stretched grids are well- 
known."*^ The central idea is that the coupling of the discrete equa- 
tions is considerably higher in the direction normal to the grid 
stretching, so that a relaxation scheme can be constructed to more 
efficiently solve the equations associated with lines in these direc- 
tions. 

Line Construction Method 

On Structured grids the line-relaxation direction is typically well- 
defined because a grid coordinate direction can be readily em- 
ployed. For unstructured grids, this inherent direction is not avail- 
able and must be constructed prior to performing any computations. 
In Ref 44, implicit lines have been constructed based on neighbor- 
ing prismatic and hexahedral elements within the boundary layer. 
In the current work, only tetrahedral elements are employed; thus, 
an alternative line construction strategy must be used. 

To construct a line for implicit relaxation through the boundary 
layer, an initial point is chosen on a viscous boundary surface. A 
surface normal is constructed at this point, and the direction of 
edges connected to this node are compared with the normal using 
an inner product. The edge with the maximum positive inner prod- 
uct is selected to form the first line segment. The surface normal is 
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Figure 3. Result of implicit line construction algorithm applied to 

wingtip geometry of Ref. 45. 

then replaced with the direction of the chosen edge, and the process 
is repeated at the endpoint of successively selected edges, forming 
the line along which to relax. The process is terminated when the 
ratio of the length of the longest edge connected to the current node 
to that of the selected edge falls below a predefined value; a value 
of 5 is used in the current work. This criterion serves to identify the 
"inviscid" region of the grid, where elements revert to an isotropic 
distribution. Construction of a line is also terminated if an edge di- 
rection differing by less than 20 degrees from the previous segment 
cannot be found. This second stopping criterion ensures that line 
segments remain normal to the original boundary surface. The re- 
sult of applying this line construction algorithm to a simple wingtip 
geometry*’ is shown in Fig. 3. A direction suitable for line-implicit 
relaxation through the boundary layer has been formed at each 
boundary node, such that approximately 30 grid points are con- 
tained in each line. 

Another important consideration is the mesh partitioning strategy 
for parallel processing. To efficiently perform relaxation along any 
given line, the line should wholly lie within a grid partition. To 
avoid partition boundaries cutting across relaxation lines, edge 
weighting is employed in the partitioning phase. This procedure 
minimizes the number of implicit lines that are split across partition 
boundaries. In the event that an implicit line is cut by the mesh par- 
titioning, the line is terminated at the partition boundary; no attempt 
is made to perform line-relaxation across processors. Finally, a 
multiconstraint version of the partitioning algorithm"*^ is used to en- 
sure that load balancing is achieved within the line-implicit and 
point-implicit regions of the grid. 

Line Relaxation Algorithm 

Once the linear system of equations at the current time step has 
been assembled, the unknowns associated with each implicit line 
are computed exactly by using Gaussian elimination of a local 
block-tridiagonal system of equations. This procedure is repeated 
for a series of sweeps over the implicit lines; the initial decomposi- 
tion of the coefficient matrix is stored so that subsequent sweeps 
merely consist of a forward/backward substitution procedure. Once 
a suitable level of convergence is obtained for the unknowns, the 
remainder of the domain is relaxed by using several sweeps of the 
baseline point-implicit scheme; adjacent unknowns determined by 
the line-implicit scheme are taken as known quantities and moved 
to the right-hand side of the equations. 

To demonstrate the benefits of the line-implicit scheme, fully tur- 
bulent flow over the wing-body configuration"*' shown in Fig. 4 is 
computed on twenty-two 2.2 GHz Pentium IV processors using the 
baseline point-implicit scheme as well as the line-implicit algo- 
rithm. The freestream Mach number is 0.75, the angle of attack is 
0® , and the Reynolds number is 3 million based on the mean aero- 
dynamic chord. The grid contains 1,641,452 nodes and 9,650,684 


Figure 4. Surface grid for configuration of Ref. 45. 



Figure 5. Residual convergence histories for point- and line-implicit 
solutions. 


tetrahedra, and the line construction algorithm places 1,069,238 
nodes in the line-implicit region. For this test, 15 sweeps through 
the line- and point-implicit regions are used for both computations. 

Convergence of the density and turbulence residuals for both so- 
luiion schemes is shown in Fig. 5. The line-implicit strategy ulti- 
mately results in a computational savings of approximately 20% 
o\er the baseline algorithm. As shown in Fig. 6, lift and drag more 
rapidly approach their steady-state values with the line-implicit 
scheme. This behavior is attributed to the rapid development of the 
b(^undary layer region, and has been observed in a number of cases. 
For comparison, the results are plotted in Fig. 7 against experimen- 
tal values from Ref. 45. Similar to many of the computations re- 
ported in Ref. 48, the lift is slightly overpredicted; however, the 
correlation with the experimental drag polar is good. 

Derivation of the Dual Algorithm 

Consider the following form of the steady-state nonlinear gov- 
erning equations, where D and g represent the vector of design 
variables and the corresponding dependent variables, respectively, 
and R 2 represents a second-order accurate discretization of the 
spatial residual. 

R2iQ.D) = 0. 0 ) 

Note that in practice, there is also an implicit dependency on the 
computational grid; the current approach includes these terms, 
however they are omitted here to simplify the underlying analysis. 
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( 3 ) 




fV, BRA n 

where A0" = - Q'\ Note the iteration superscript n is omit- 
ted from the Jacobian matrix for clarity; unless otherwise 

noted, these derivatives will all be evaluated at time step n . 

Note that because the Jacobian used in Eq. 3 is an exact lineariza- 
tion of /? 2 , then for an infinite time step A/ . the iterative scheme 
corresjx)nds to Newton's method, and will converge qua- 

dratically to the steady-state value Q* corresponding to D . How- 
ever, the use of an exact Jacobian dR 2 /dQ is often prohibitively 
expensive for realistic problems, so that an approximate form of 
Eq. 3 is typically used: 

where the exact Jacobian has been replaced with a linearization of 
some first-order approximation to the spatial residual. The draw- 
back to such an approach is that the asymptotic convergence of the 
algorithm becomes linear in nature. 

Equation 4 is a linear system of equations for AQ " , which in 
principle can be solved exactly. In practice, however, the system is 
usually solved approximately by using an iterative method. There- 
fore, Eq, 4 can be restated as 

AQ^-^PR.(Q\D) = 0 , ( 5 ) 

where P is an approximation to the inverse of V/Atl + dR^/dQ , 
typically achieved through some iterative scheme such as a Gauss- 
Seidel iteration. In this manner, the asymptotic rate of convergence 
is governed by the largest eigenvalue of the matrix I - . 

Direct Differentia tion Algorithm 

With minor modifications, the iterative algorithm outlined above 
can be used to determine the sensitivity derivatives of an output 
function / = /(g*, D) with respect to a design variable. Applica- 
tion of the chain rule yields the following: 



Figure 7. Comparison of computed force coefficients with 
experimental data of Ref. 45, 


df _ df dQ' df 
dD dQ’dD * dD 


( 6 ) 


Linearizing Eq. 1 with respect to D gives the linear residual equa- 
tions for the sensitivity derivatives dQ*/dD : 


dR^ 

ae*ao ao “ ® 


(7) 


Applying the algorithm of Eq. 3 to this expression gives an iterative 
scheme for dQ*/dD : 


U/^dgMa/)] ■'a^ldsj 


dR. 

+ = 0 • 
BD 


( 8 ) 


Finally, by replacing the exact Jacobian with the approximate Jaco- 
bian as before and performing an approximate inverse gives the 
final form of the direct differentiation iterative algorithm for the 
sensitivity derivatives BQ*‘/BD : 


An iterative algorithm based on a backward Euler integration 
scheme for the solution of the nonlinear system of equations given 
by Eq. 1 can be written as 

= 0 . a) 

where V and At represent the local cell volume and time step, re- 
spectively, and g is the vector of dependent variables at time step 
n . The nonlinear iterative scheme is attained by linearizing the re- 
sidual about time step n : 



+ P 


rd«j 


Lde*' 




= 0. 


(9) 


Note the asymptotic rate of convergence is again dictated by the 
largest eigenvalue of / - PBR 2 /BQ , and therefore will be equiva- 
lent to that of the scheme used to attain g* . After N iterations of 
Eq. 9, the derivative of the output / with respect to the design vari- 
ables D may be approximated as 


iL = I 

dD dQ\hD) dD' 


( 10 ) 
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Exact Dual Algorithm 

In this section, an iterative solution of the dual problem is con- 
structed in a manner which guarantees the functional estimate is 
identical at every iteration to that obtained from the direct differen- 
tation algorithm. Using an adjoint approach, the sensitivity deriva- 
tive of the output can be written as 


= 

dD 


rdR. 



( 11 ) 


or 

-x^ = (19) 

where A = M -N . Here, M is some matrix that is easily invert- 
ible and approximates A . 

C'lven some initial estimate x , after J iterations the scheme re- 
sults in the following: 

jr'-jr® = (20) 


where A is an adjoint variable that satisfies 


Since Eq. 12 is independent of the vector D , the adjoint approach 
is particularly attractive for aerodynamic design problems with a 
large number of design variables and relatively few constraints. By 
examining Eqs. 7 and 12, it can be seen that 



dQ*dD dD 


(13) 


Therefore, with respect to Eq. 9, the approximate inverse P using 
J iterations of the fixed-point method is 

P = . (21) 

Forming the transpose of P yields 

P^ = 1-A^^\(M 'N)\ . ( 22 ) 

Bei ase = I-M~'a , then 

P^ = 1-A^''\(I-M~'A)] . 


Therefore, the values of df /dD calculated from Eqs. 6 and 1 1 are 
identical. However, since an iterative algorithm is used to estimate 
dQ*/dD, this relationship will not hold in general. Following 
Giles' exact dual approach, an iterative adjoint solution algorithm is 
sought which satisfies 


dQ\dD ) 




(14) 


where N is some iteration at which the equality is to be enforced. 
However, the resulting exact dual algorithm will not depend on N , 
and the expression given by Eq. 14 will be valid at every iteration. 
In this sense, the adjoint iterative algorithm will be exactly dual to 
the direct differentiation algorithm. 

Introducing Lagrange multipliers, the left-hand side of Eq. 14 
after N iterations can be written as 




df ^ df (15) 

dQ\dD ) dQ\dD ) 

rdR^(dQ\'' dR.l 


n = a 

Then, by defining 




A" = ^ 


(16) 


and performing the manipulations shown in the appendix, the exact 
duality condition requires that 


■ A" + 




= 0 


(17) 


with an initial condition A® = 0 . As mentioned previously, Eq. 17 
is independent of N and thus the exact duality condition holds for 
all values of N . This condition guarantees an asymptotic rate of 
convergence for the dual problem which is governed by the largest 
eigenvalue of I - P^ldR./dQ!^ and is therefore ultimately 
equivalent to that of the solution for g* . 


Relationship Between P and P^ 

In the current work, the approximate inverse P is the result of 
applying multiple iterations of a fixed-point method, which can be 
written in the following manner: 

(18) 


or 


P^ = I- [M~^N^]^A~^ . (23) 

If tlie dual problem takes the form A^y = g as in Eq. 17, then 
analogous to Eq. 20, Eq. 23 implies that the fixed-point iterative 
scheme for the dual problem is 

-y^ = M~^{g - A V) ■ (24) 

For a point-Jacobi scheme, the transpose operator applied to the 
iteration matrix M is of no consequence, other than to imply that 
the diagonal blocks are to be transposed. However, for a Gauss- 
Seidel iterative method such as the one used in the current work, 
the transpose operation transforms an upper-triangular iteration ma- 
trix into a lower-triangular form, and vice versa. Therefore, the 
transpose implies that even-odd sweeps through the system of 
equations must be performed in a reverse manner. The same argu- 
ment holds for the line-implicit operator used to relax the boundary 
la> er region, although the solution within each line can be obtained 
in the same manner as the baseline scheme because the inversion is 
exact for the set of local equations within the line. 

Implementation of the Dual Algorithm 

The majority of the effort associated with developing an adjoint 
solver is in forming the exact linearizations of the second-order re- 
sidual. For the codes used in the current study, this differentiation 
has been previously performed and the accuracy has been estab- 
lished in Refs. 1, 3, and 5. The implementation of the exact dual al- 
gC'rithm is primarily a high-level task that involves the manipula- 
tion of existing routines; this process is made easier through the 
programming practices established in Ref. 49. 

After the flow solution has completed a series of n time steps 
using the algorithm given by Eq. 5, the adjoint solve can be per- 
formed in an analogous fashion according to Eq. 17. Note that the 
dual iterative scheme is essentially identical to the baseline analysis 
scheme so that little extra coding is required. 

According to Eq. 17, the exact linearization of the second-order 
spatial residual is required for the adjoint computation. However, 
since these terms only appear in a matrix- vector product, they can 
be formed on a term-by-term basis and therefore do not require the 
extensive storage typically associated with the higher-order stencil. 
The transpose of the first-order Jacobian matrix 
(V/At)I + dRx/dQ* is used to solve the system of adjoint equa- 
tions, so that the storage required by the exact dual scheme is 
roughly equivalent to that of the baseline analysis code. This re- 
quirement is in contrast to the solution algorithm outlined in Ref. 5, 
in which the complete linearization of the second-order residual 
was used as a preconditioner. The terms arising from 
ti l is larger stencil resulted in a memory requirement approximately 
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Figure 8. Elensity residual histories for inviscid and laminar flow 
over ONERA M6 wing. 


five times that of the scheme outlined in Refs. 35 and 36. Finally, 
since the Jacobian matrix is constant for an adjoint com- 

putation, the block and tridiagonal -block decompositions used for 
the point- and line-implicit iterations need only be performed once 
and stored. 

The solution strategy outlined in Refs. 1 and 4 required a fre- 
quent computation of the form [3/f2/3g*]^AA for the Krylov 
method being used. The viscous terms arising from the nearest- 
neighbor discretization were stored, so that only the inviscid terms 
involving the larger stencil were recomputed for each new Krylov 
vector. In the current work, an analogous approach can be used in 
the formation of the adjoint residual component if 

additional memory is available. Furthermore, if sufficient memory 
is available for the higher-order terms as used for the precondi- 
tioner in Ref. 5, the complete linearization may be stored. In this 
manner, the residual update at time step n reduces to an explicit 
matrix-vector product. Since all terms comprising the residui are 
stored in this approach, the solution time for the adjoint problem 
could be reduced considerably. Solution times and memory require- 
ments for each of these strategies will be shown in a subsequent 
section. 

Validation Cases 

A series of small test cases involving inviscid, laminar, and tur- 
bulent flow over the ONERA M6 wing geometry is presented to 
verify that the exact dual algorithm has been implemented cor- 
rectly. Each of the viscous tests has been run using the line-implicit 
relaxation scheme. For each example, the property of Eq. 14 has 
been verified to machine accuracy, and the asymptotic convergence 
rates for density (and turbulence where appropriate) are shown to 
be equal to that of the corresponding adjoint equations. For visual 
clarity, the convergence histories of the momentum and energy 
equations and their adjoint counterparts are not shown in the figures 
that follow, but have been found to closely follow that of the den- 
sity equation. The cost function for each test case is based on lift, 
and the CFL number for the flow solution has been linearly ramped 
from 10 to 200 over the first 50 iterations. Unless otherwise stated, 
each of the computations shown here has been performed on six- 
teen 2.2 GHz Pentium IV processors. 

Inviscid Wing 

The first test case is inviscid flow, with a freestream Mach num- 
ber equal to 0.84 and an angle of attack of 3.06® . The grid for this 
test contains 357,900 nodes and 2,000,034 tetrahedra. The conver- 
gence histories of the density equation for the flow and adjoint so- 
lutions are shown in Fig. 8; the convergence rates are equivalent. 



Figure 9. Density and turbulence residual histories for turbulent 
flow over ONERA M6 wing. 



Figure 10. Convergence of lift and lift derivatives for turbulent flow 
over ONERA M6 wing. 

Laminar Wing 

The first viscous case is for laminar flow with a freestream Mach 
number of 0.3 , an angle of attack of 2® , and a Reynolds number of 
1000 based on the mean aerodynamic chord. The grid shown in 
Fig. 1 is used for this example. The density convergence histories 
for the flow and adjoint solutions are shown in Fig. 8, and the con- 
vergence rates are identical. 

Turbulent Wing 

The final validation case is for turbulent flow over the ONERA 
M6 wing on the same grid that was used in the previous example. 
For this case, the freestream Mach number is 0.84, the angle of at- 
tack is 3.06® , and the Reynolds number is 5 million based on the 
mean aerodynamic chord. The convergence histories for the flow 
and adjoint equations shown in Fig. 9 exhibit similar asymptotic 
rates. 

To further demonstrate the asymptotic nature of the flow and ad- 
joint solution algorithms, the turbulent flow test case is also used to 
examine the convergence of a cost function and its derivatives. For 
this test, the wing has been parameterized by using the method of 
Ref. 50. Fig. 10 shows the error in the lift coefficient as the flow so- 
lution converges; it also shows the error in the derivatives of lift 
with respect to freestream Mach number, angle of attack, and sev- 
eral shape design variables at the midspan location as the adjoint 
solution converges. For this case, the error is defined as the differ- 


6 





Table 1. Memory and CPU requirements for flow solver and 
various linearization storage strategies used for computing adjoint 
residual. 


Solver 

Memory 

(GB)' 

Wallclock Time 
(Hours) 

Flow Solver 

1.5 

50.4 

Adjoint Solver: 



Explicit computation of all 
residual terms 

1.6 

42.0 

Explicit computation of inviscid 
residual terms; viscous, turbulent 
residual terms stored 

3.1 

25.9 

All residual terms stored 

10.8 

17.1 


ence between the current value and the final converged result. The 
errors are reduced at a similar rate for each computation. 

The turbulent flow test case is also used to evaluate the lineariza- 
tion storage strategies described above. Tests are performed on six- 
teen 250 MHz R 10000 processors of a Silicon Graphics Ongin 
2000 system to simplify memory monitoring and to eliminate the 
effects of network traffic. The turbulent flow validation case has 
been repeated using each of the storage strategies; memory and 
wallclock statistics are shown in Table 1 . 

Recomputing all of the necessary linearizations for the adjoint 
residual vields a memory requirement roughly equivalent to that of 
the flow^solution, while the wallclock time is approximately 17% 
less. The discrepancy in solution times can be attributed to several 
factors Forming the flux Jacobians for the flow solution is roughly 
equivalent in cost to a residual evaluation for the adjoint equations, 
with a slight penalty for the computation of the second-order invis- 
cid terms required by the latter. However, in the current implemen- 
tation these second-order terms merely require an application of the 
chain rule during computation of the first-order inviscid lineariza- 
tions. Note that the flux Jacobians used for the left-hand side are 
fixed during an adjoint solution, so that the initial cost of forming 
these terms and performing the block-decompositions and block- 
tridiagonal decompositions a priori is amortized over the duration 
of the computation. In addition to the flux Jacobians required at 
each time step of a flow solution, a residual evaluation is necessary, 
which requires updated solution gradients and other soludon-de- 
pendent terms such as edge weights. Finally, the flow solution also 
incurs a small amount of overhead at each time step in the computa- 
tion of current forces and moments. 

Alternatively, by storing the nearest-neighbor adjoint residual 
contributions that arise from the linearization of the viscous and 
turbulent terms, the wallclock time for an adjoint solution can be re- 
duced to approximately half that of the flow solution while dou- 
bling the memory requirement. Finally, by storing the complete lin- 
earization of the second-order residual, the computational time is 
reduced to roughly one-third of the baseline flow solution; how- 
ever, the memory requirement for this approach is a factor of 6 
higher than the baseline flow solution algorithm. Based on these re- 
sults, the viscous and turbulent terms are stored throughout the re- 
mainder of the current work, and the inviscid contributions are re- 
computed as needed. 

Large-Scale Test Cases 

Two large-scale test cases are used to evaluate the solution algo- 
rithm on realistic configurations. Each of the grids has been gener- 
ated by using the method described in Ref. 51. In the interest of 
comparing asymptotic convergence behavior, solutions have been 
converged considerably beyond the usual requirements for engi- 
neering-level answers; the stated run times do not represent time re- 
quired to obtain a sufficiently accurate solution. 



Figure 11. Surface grid for high-lifl configuration. 


High*Lift Configuration . tr 

furbulent flow over the high -lift configuration shown in Fig. 

1 1 IS computed using eighteen 1.7 GHz Pentium IV processors. The 
grid contains 846,863 nodes and 4,879,086 cells, and the implicit 
line construction algorithm places 418,523 nodes in the line-im- 
plicit region. Although only a single plane is shown, symmetry 
plane boundary conditions are used on both sides of the configura- 
tion. The fieestream Mach number is 0.2. the angle of attack is 
i:"" , and the Reynolds number is 9 million based on the stowed 
chord length. The CFL numbers for the flow equations and turbu- 
lence model are linearly ramped from 10 to 200 and 1 to 20, respec- 
tively, over the first 200 iterations. For this case, the objective func- 
tion is based on the lift coefficient. The flow solver requires 
approximately 3.5 GB of memoiy and 36.2 hours of wallclock time 
for the current example; the adjoint solver requires roughly 7. 1 GB 
ard 19 2 hours. The convergence histories for the flow and adjoint 
solutions are shown in Fig. 12; the asymptotic rates are similar. 


Modern Transport Configuration 

For this test, transonic flow over the modem transport configura- 
tion shown in Fig. 13 is considered. The grid for this case contains 
1 731 262 nodes and 10,197,838 cells. The implicit line construc- 
tion algorithm places 1.220,567 nodes in the line relaxation region. 
The fieestream Mach number for this case is 0.84, the angle of at- 
tack is 2.25° , and the Reynolds number is 3 million based on the 
mean aerodynamic chord. The computations shown here are per- 
formed on twenty-three 2.4 GHz Pentium IV processors, and the 
cost function is based on the drag coefficient. 
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1000 2000 3000 

Iteration 


Figure 12. Density and turbulence residual histories for turbulent 
flow over high-lift configuration. 



Figure 13. Surface grid for modem transport configuration. 



Iteration 

Figure 14. Density and turbulence residual histories for turbulent 
flow over modem transport configuration. 


The convergence histories for the flow and adjoint solutions are 
shown in Fig. 14. The CFL numbers for the flow equations and tur- 
bulence model are linearly ramped from 10 to 200 and 1 to 20, re- 
spectively, over the first 200 iterations. The CFL number used for 
the turbulence model is increased to 200 after the first 1000 itera- 
tions. As shown in Fig, 14, the asymptotic rates for both computa- 
tions are similar. For this example, the flow solution requires 7.4 
GB of memory and 41.8 hours of wallclock time; the adjoint solu- 
tion requires 15.0 GB of storage and 32 wallclock hours. 

Summary and Future Work 

An adjoint solution algorithm that preserves discrete duality for 
turbulent flows on unstructured grids has been developed and im- 
plemented. The scheme is based on a backward-Euler discretization 
of the Reynolds-averaged Navier-Stokes equations with a tightly 
coupled one-equation turbulence model, where the linear problem 
at each time step is solved with a line-implicit relaxation in the 
boundary layer and a point-implicit technique through the remain- 
der of the domain. 

Results for several cases show that the exact dual scheme 
achieves asymptotic convergence behavior equivalent to that of the 
flow problem. By storing the viscous and turbulent contributions to 
the adjoint residual, the memory required for the algorithm is ap- 
proximately twice that of the flow solution, but with execution 
times roughly 25—50% less for an equivalent number of iterations. 

Efforts are currently underway to develop a multigrid implemen- 
tation for improved convergence rates and to extend the discretiza- 
tion to include mixed-element grids. Real gas physical models from 
existing hypersonic structured-pid solvers are also being added, 
with the long-term goal of achieving a self-adaptive analysis and 
design capability valid across the speed range.^^ 
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Appendix: Derivation of Exact Dual Outer Iteration 

In this appendix, the expression given by Eq. 15 is manipulated 
io give the exact dual iterative scheme as shown in Eq. 17. The der- 
ivation shown here is identical to that of Ref. 18. 

Using the discrete form of integration by parts, namely 

v-i 

= a>^'b^-a0b0-^(a"^>-a'‘)b'’, (25) 


Eq. 15 can be expanded as 



Rearranging terms and applying the initial condition 
[dQydof = 0 gives 
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By applying the condition ^Q*) variable sub- 

stitution shown in Eq. 16, the exact dual iteration takes the form of 
Eq. 17 through the elimination of the terms involving ^gV^Z> . 
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